ON THE RANDOM SAMPLING OF PAIRS, WITH 
PEDESTRIAN EXAMPLES 

RICHARD ARRATIA AND STEPHEN DESALVO 

Abstract. Suppose one desires to randomly sample a pair of ob- 
jects such as socks, hoping to get a matching pair. Even in the sim- 
plest situation for sampling, which is sampling with replacement, 
the innocent phrase "the distribution of the color of a matching 
pair" is ambiguous. One interpretation is that we condition on the 
event of getting a match between two random socks; this corre- 
sponds to sampling two at a time, over and over without memory, 
until a matching pair is found. A second interpretation is to sam- 
ple sequentially, one at a time, with memory, until the same color 
has been seen twice. 

We study the difference between these two methods. The input 
is a discrete probability distribution on colors, describing what 
happens when one sock is sampled. There are two derived distri- 
butions — the pair-color distributions under the two methods of 
getting a match. The output, a number we call the discrepancy of 
the input distribution, is the total variation distance between the 
two derived distributions. 

It is easy to determine when the two pair-color distributions 
come out equal, that is, to determine which distributions have 
discrepancy zero, but hard to determine the largest possible dis- 
crepancy. We find the exact extreme for the case of two colors, by 
analyzing the roots of a fifth degree polynomial in one variable. We 
find the exact extreme for the case of three colors, by analyzing the 
49 roots of a variety spanned by two seventh-degree polynomials 
in two variables. We give a plausible conjecture for the general sit- 
uation of a finite number of colors, and give an exact computation 
of a constant which is a plausible candidate for the supremum of 
the discrepancy over all discrete probability distributions. 

We briefly consider the more difficult case where the objects 
to be matched into pairs are of two different kinds, such as male- 
female or left-right. 
1. Motivation

The problem that inspires us: Suppose a drawer has 12 white and 
4 black socks. How many socks must one remove to ensure a pair of 
matching color? The answer, 3, illustrates the pigeon-hole principle. 
The statement of detailed counts, 12 and 4, was arbitrary, but leads to 
the problem that we address in this paper: what is the distribution of 
the color of a matching pair? 

To simplify, we take the limit as the number of socks in the drawer 
goes to infinity, while the proportions remain constant, e.g., seventy 
five percent white and twenty five percent black. 

We consider two sensible methods for choosing "a matching pair." 

(M1) Select objects two at a time until a pair of the same color is
     selected in a single round;
(M2) Select objects one at a time until the first pair of the same color
     is found.

For a second example, if there are 365 equally likely colors for socks,
then, under Method 2 the maximum number of socks inspected is 366,
but the expected number is 24.6166... (see footnote 1). In contrast, the
expected number of pairs inspected by Method 1 is exactly 365, hence the
expected number of socks inspected is 730. However, our focus is not on
the number of socks inspected, but rather, on the distribution of the
color of the matching pair.

1 The exact computation is EN = \sum_{k \ge 0} P(N > k) = \sum_{k \ge 0} (365)_k / 365^k, with
the notation (n)_k = n!/(n - k)! for n falling k.
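As a numerical companion to the tail-sum formula in the footnote, the following Python sketch evaluates EN = \sum_{k \ge 0} P(N > k) with P(N > k) = (n)_k / n^k for n equally likely colors, and the geometric waiting time for Method 1. The function names are illustrative and are not taken from the paper.

```python
from fractions import Fraction

def expected_socks_method2(n_colors: int) -> float:
    """E[N] for Method 2 with n_colors equally likely colors.

    Uses E[N] = sum_{k>=0} P(N > k), where P(N > k) is the probability
    that the first k socks all have distinct colors, (n)_k / n^k.
    """
    total = Fraction(0)
    all_distinct = Fraction(1)  # P(N > 0) = 1
    for k in range(n_colors + 1):
        total += all_distinct
        # P(N > k + 1) = P(N > k) * (n - k) / n
        all_distinct *= Fraction(n_colors - k, n_colors)
    return float(total)

def expected_socks_method1(n_colors: int) -> float:
    """Method 1: the number of pairs is geometric with success
    probability 1/n_colors, and each pair uses two socks."""
    return 2.0 * n_colors

if __name__ == "__main__":
    print(expected_socks_method2(365))
    print(expected_socks_method1(365))
```

Exact rational arithmetic is used only to keep rounding out of the picture; floating point would serve equally well at this size.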
In our first example, under Method 1 the odds for a white pair over a
black pair are (12/16)^2 to (4/16)^2; equivalently 12^2 to 4^2, or 3^2 to 1^2, so
that 9/10 of the time the pair is white, and 1/10 of the time it is black.
Under Method 2, the outcomes resulting in a white pair correspond to
ww, bww, wbw, with total probability (.75)^2 + 2(.75)^2(.25) = 27/32,
and the outcomes resulting in a black pair correspond to bb, wbb, bwb,
with total probability (.25)^2 + 2(.75)(.25)^2 = 5/32.

To summarize, the input is a distribution on colors, p = (.75, .25), 
and there are two outputs: under Method 1, the color of a pair is 
white with probability .9, and black with probability .1, while under 
Method 2, the color of a pair is white with probability 27/32, and black
with probability 5/32.

    p = (.75, .25)
    M1(p) = (.9, .1)
    M2(p) = (.84375, .15625).
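The two procedures can also be simulated directly. The sketch below is a Monte Carlo illustration under the stated sampling-with-replacement model; the helper names (`sample_color`, `method1`, `method2`, `white_fraction`) and the seeds are our own choices, and the estimates are only approximate.

```python
import random

def sample_color(p, rng):
    """Draw one sock color from the distribution p (a list of probabilities)."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]

def method1(p, rng):
    """Draw two socks at a time, without memory, until the two match."""
    while True:
        first, second = sample_color(p, rng), sample_color(p, rng)
        if first == second:
            return first

def method2(p, rng):
    """Draw one sock at a time, with memory, until some color repeats."""
    seen = set()
    while True:
        color = sample_color(p, rng)
        if color in seen:
            return color
        seen.add(color)

def white_fraction(method, p, trials, seed):
    """Fraction of trials in which the matching pair has color 0 (white)."""
    rng = random.Random(seed)
    return sum(method(p, rng) == 0 for _ in range(trials)) / trials

if __name__ == "__main__":
    p = [0.75, 0.25]
    print(white_fraction(method1, p, 20000, seed=1))  # close to 0.9
    print(white_fraction(method2, p, 20000, seed=2))  # close to 27/32
```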

Some natural questions, for an arbitrary discrete distribution p for 
the color of a single sock: 

(Q1) When does M1(p) = M2(p)?

(Q2) How far apart can M1(p) and M2(p) be from each other?

There are practical algorithms [?] for sampling, exploiting the birthday
paradox, that require getting a matching pair whose color has the
distribution (M1), but under a naive opportunistic implementation,
would only find a pair whose color is distributed according to (M2).
Question (Q2) above is about quantifying the error that would result 
from using the opportunistic implementation. 

2. Pair-derived distributions 

In general, we write S for the random color of a single sock, and
describe the initial distribution of colors with

    p_i = P(S = i).

When the number of colors is finite, say n + 1, then we let the colors be
0, 1, 2, ..., n, and the distribution of S is given by p = (p_0, p_1, ..., p_n).
Our initial example had n + 1 = 2, p = (p_0, p_1) = (.75, .25). When
the number of colors is infinite, we take the colors to be 0, 1, 2, ..., and
then p = (p_0, p_1, p_2, ...).

Method 1 may be described as the color X of a pair of randomly
chosen socks, conditional on getting a match. More precisely, the two
chosen socks have colors S and S' and are independent and identically
distributed, with p_i = P(S = i). We write

(1)    f_2 := P(S = S') = \sum_i P(S = S' = i) = \sum_i p_i^2

for the probability that two randomly chosen socks match, so

(2)    P(X = i) = P(S = i | S = S') = \frac{p_i^2}{f_2}.

Method 2 involves a sequential procedure: pick socks one at a time
until a duplicate color is found. Suppose that when this duplicate is
found, there have been k other colors, with k = 0, 1, 2, .... Write i for
the duplicate color, and J = {j_1, ..., j_k} for the single colors, so that
i \notin J and |J| = k. The second occurrence of color i is at time k + 2,
and for the first k + 1 socks, any permutation of the colors in {i} \cup J is
valid. Hence the color Y of the matching pair found by Method 2 has
distribution given by

(3)    P(Y = i) = p_i^2 \sum_k (k+1)! \sum_J p_{j_1} \cdots p_{j_k}.

In the sum above, |J| = k and i \notin J.
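Formulas (2) and (3) can be evaluated exactly for a distribution with finite support. The following Python sketch assumes p is given as a list of `Fraction` values; it computes the inner sum of (3) as an elementary symmetric polynomial of the remaining probabilities, which is our implementation choice rather than anything prescribed by the text.

```python
from fractions import Fraction
from math import factorial

def elementary_symmetric(values):
    """Return [e_0, e_1, ..., e_m] for a list of m values, where e_k is the
    sum over k-element subsets of the product of the chosen values."""
    e = [Fraction(1)] + [Fraction(0)] * len(values)
    for v in values:
        for k in range(len(values), 0, -1):
            e[k] += v * e[k - 1]
    return e

def method1_dist(p):
    """Formula (2): P(X = i) = p_i^2 / f_2, with f_2 = sum_i p_i^2."""
    f2 = sum(x * x for x in p)
    return [x * x / f2 for x in p]

def method2_dist(p):
    """Formula (3): P(Y = i) = p_i^2 * sum_k (k+1)! * e_k(p without p_i)."""
    result = []
    for i, p_i in enumerate(p):
        others = p[:i] + p[i + 1:]
        e = elementary_symmetric(others)
        weight = sum(factorial(k + 1) * e[k] for k in range(len(e)))
        result.append(p_i * p_i * weight)
    return result

if __name__ == "__main__":
    p = [Fraction(3, 4), Fraction(1, 4)]
    print(method1_dist(p))  # the (.9, .1) of the first example
    print(method2_dist(p))  # the (27/32, 5/32) of the first example
```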



3. When are the two pair-picking methods the same? 

A discrete distribution is said to be uniform if it has finite support,
say of size n + 1, and for each color i in the support, p_i = 1/(n + 1). It is
easy to see that if p is uniform, then M1(p) = M2(p) (see footnote 2). The
converse is true, but not so easy to prove; we will first prove an ancillary
result in Lemma 1, and then summarize in Theorem 1.

Lemma 1. Under Method 2, as specified by (3),

(4)    if p_i \ge p_j > 0, then \frac{P(Y = i)}{p_i^2} \le \frac{P(Y = j)}{p_j^2},

hence

(5)    if p_i = p_j > 0 then P(Y = i) = P(Y = j).



2 Because, in fact, if p is a uniform distribution, then both M1(p) and M2(p)
are equal to the original uniform distribution: by the principle of ignorance,
all possible colors are alike, and hence, equally likely under each of the derived
methods. We invite the reader to consider: is the "principle of ignorance," i.e.,
invoking symmetry without presenting details, an adequate proof?

Also,

(6)    if p_i > p_j > 0, then \frac{P(Y = i)}{p_i^2} < \frac{P(Y = j)}{p_j^2}.

Proof. Assume p_i \ge p_j > 0. Define t(i, k) to be the inner sum of (3),
so that

    \frac{P(Y = i)}{p_i^2} = \sum_k (k+1)! t(i, k).

To prove (4) it suffices to show that if p_i \ge p_j > 0 then t(i, k) \le t(j, k)
for all k, and to further prove (6), it suffices to show that if p_i > p_j
then t(i, k) < t(j, k) for at least one k. With sums always taken over
sets J of size k,

    t(i, k) = \sum_{i \notin J} p_{j_1} \cdots p_{j_k}
            = \sum_{i \notin J, j \notin J} p_{j_1} \cdots p_{j_k}
              + \sum_{i \notin J, j \in J} p_{j_1} \cdots p_{j_k};

that is, in the sum over sets J excluding i, we take cases according to
whether or not j \in J. With a similar decomposition of t(j, k), taking
the difference yields

    t(i, k) - t(j, k) = \sum_{i \notin J, j \in J} p_{j_1} \cdots p_{j_k}
                        - \sum_{i \in J, j \notin J} p_{j_1} \cdots p_{j_k}.

There is a bijection between sets J for the first sum and sets J for
the second sum, that substitutes i for j. From p_i \ge p_j it follows
that for all k, t(i, k) \le t(j, k), and further, when p_i > p_j, we have
t(i, 1) < t(j, 1). □
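The monotonicity in Lemma 1 is easy to check by brute force on a small example. The sketch below computes t(i, k) by enumerating subsets, which is only practical for a handful of colors; the function names `t` and `ratio` are ours.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial, prod

def t(p, i, k):
    """Inner sum of (3): sum, over k-element sets J of colors with i not in J,
    of the product of the probabilities of the colors in J."""
    others = [q for idx, q in enumerate(p) if idx != i]
    return sum((prod(subset) for subset in combinations(others, k)), Fraction(0))

def ratio(p, i):
    """P(Y = i) / p_i^2 = sum_k (k+1)! t(i, k)."""
    return sum(factorial(k + 1) * t(p, i, k) for k in range(len(p)))

if __name__ == "__main__":
    p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    # Larger p_i should give a smaller ratio, as in (6).
    print([ratio(p, i) for i in range(len(p))])
```

For k = 1 the difference t(i, 1) - t(j, 1) reduces to p_j - p_i, which is the strict inequality used at the end of the proof.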

Theorem 1. Over all discrete distributions p, the derived distributions
of X and Y, given by (2) and (3), are equal if and only if p is a uniform
distribution.

Proof. Assume first that p is a uniform distribution, say over n + 1
colors, so that for all i, j in the support of p, we have p_i = p_j =
1/(n + 1). For i, j both in the support of p, it is obvious from (2)
that p_i = p_j implies P(X = i) = P(X = j), and (5) shows that
P(Y = i) = P(Y = j). Hence for i in the support of p, P(X = i) =
1/(n + 1) = P(Y = i), implying that X and Y have the same
distribution.

To prove the opposite direction, suppose p is not a uniform distribution.
Then we can fix i, j with p_i > p_j > 0. From (6), we get

    \frac{P(Y = i)}{p_i^2} < \frac{P(Y = j)}{p_j^2},

and dividing by f_2 to relate with (2), and rearranging,

    \frac{P(X = i)}{P(X = j)} > \frac{P(Y = i)}{P(Y = j)},

which implies that X and Y have different distributions. □

Theorem 1 gives a complete answer to our first question: when are
the two pair-picking methods the same? Next we turn to the second 
question: when the two methods are different, how different can they 
be? 

4. Total variation distance 

We wish to quantify: given a probability distribution p, with the 
matching pair chosen by Method 1 or Method 2, how far apart are the 
two distributions with respect to the color of the matching pair? 

A natural metric on the space of all probability measures is the total 
variation distance. 

Definition 1. For two real-valued random variables X and Y, the total
variation distance between the laws of X and Y is defined as follows.

    d_TV(\mathcal{L}(X), \mathcal{L}(Y)) = \sup_A |P(X \in A) - P(Y \in A)|,

where the sup is taken over all Borel sets A \subset \mathbb{R} (see footnote 3).

It is common to write d_TV(X, Y) instead of d_TV(\mathcal{L}(X), \mathcal{L}(Y)).

Definition 2. Given a discrete probability distribution p, let X have
the Method 1 distribution given by (2); let Y have the Method 2 distribution
given by (3), and define the discrepancy of p by

(8)    D(p) = d_TV(X(p), Y(p)).

We could have written D(p) = d_TV(X, Y) above, but we preferred
d_TV(X(p), Y(p)), to emphasize that D(p) is the total variation distance
between two probability laws, with each law being a function of
a third underlying law p.

Some elementary facts about total variation distance: When X and
Y are discrete random variables, an equivalent definition is

(9)    d_TV(X, Y) = \frac{1}{2} \sum_k |P(X = k) - P(Y = k)|,



3 This choice of definition is useful for probability, with the desirable property
that d_TV(X, Y) \le 1, and it equals \sup_{f: \mathbb{R} \to [0,1]} |E f(X) - E f(Y)|. But there is
an alternate tradition, from analysis, to define the total variation distance
between measures \mu, \nu as \sup_{f: \mathbb{R} \to [-1,1]} |\int f d\mu - \int f d\nu|, which, when applied to
\mu = \mathcal{L}(X), \nu = \mathcal{L}(Y), gives values ranging from 0 to 2.

and it follows, from \sum_k P(X = k) = \sum_k P(Y = k), that by splitting
up the summands into positive and negative parts (see footnote 4),

Lemma 2.

(10)    d_TV(X, Y) = \sum_k (P(X = k) - P(Y = k))^+
                   = \sum_k (P(X = k) - P(Y = k))^-.

For example, when X is a Bernoulli random variable with parameter \theta
and Y is Bernoulli with parameter \theta', the total variation distance
is |\theta - \theta'|.
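The three equivalent expressions in (9) and (10) translate directly into code. This is a minimal sketch for two distributions given as equal-length lists over the same finite set of colors; the function names are ours.

```python
from fractions import Fraction

def tv_half_l1(px, py):
    """Formula (9): half the sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(px, py)) / 2

def tv_positive_part(px, py):
    """First form in (10): sum of positive parts of P(X = k) - P(Y = k)."""
    return sum(max(a - b, 0) for a, b in zip(px, py))

def tv_negative_part(px, py):
    """Second form in (10): sum of negative parts of P(X = k) - P(Y = k)."""
    return sum(max(b - a, 0) for a, b in zip(px, py))

# The two pair-color distributions from the 12-white, 4-black example.
m1 = [Fraction(9, 10), Fraction(1, 10)]
m2 = [Fraction(27, 32), Fraction(5, 32)]
print(tv_half_l1(m1, m2), tv_positive_part(m1, m2), tv_negative_part(m1, m2))
# prints: 9/160 9/160 9/160
```

For that example the discrepancy is therefore D(p) = 9/160 = 0.05625.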

5. Special Cases 

5.1. Dimension n = 1: two colors of socks. In the case n = 1, we
write p = (p_0, p_1) = (x, 1 - x) (see footnote 5). The discrepancy
D(p) = d_TV(X, Y) simplifies, via Lemma 2, to |d_1|, where

    d_1(x) = P(X = 0) - P(Y = 0) = \frac{x^2}{x^2 + (1 - x)^2} - (x^2 + 2(1 - x)x^2).

The expression is plotted in Figure 1.

Since d_1 is a rational function in one variable, it is easily optimized
over x \in [0, 1]. We outline our procedure as a preparation for the
more difficult case in Section 5.2. We first put the derivative over a
common denominator, which is strictly positive for 0 < x < 1, and
focus our attention on the numerator. The numerator is a sixth degree
polynomial in x of the form 4(-x + 7x^2 - 18x^3 + 24x^4 - 18x^5 + 6x^6),
having four real roots: 0, 1,

(11)    x_1 := \frac{1}{6} (3 + \sqrt{3(-3 + 2\sqrt{3})}) = 0.696660,

and the conjugate, 1 - x_1. The list of roots already includes both
endpoints of the domain [0, 1]. The cusp for |d_1| at x = 1/2 is
also critical, with |d_1(1/2)| = 0 corresponding to the uniform case.
Evaluating |d_1| at these five critical numbers exhausts all possible
extremes, and the maximum value is

    d_1(x_1) = \frac{1}{\sqrt{135 + 78\sqrt{3}}} = 0.0608468.
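The two-color optimization can be checked numerically. In the sketch below, the closed form used for x_1 is the expression consistent with the quoted value 0.696660, and `derivative_numerator` is the sixth degree polynomial stated in the text; the grid comparison at the end is our own sanity check, not part of the original argument.

```python
from math import sqrt

def d1(x):
    """d_1(x) = P(X = 0) - P(Y = 0) for p = (x, 1 - x)."""
    return x**2 / (x**2 + (1 - x)**2) - (x**2 + 2 * (1 - x) * x**2)

def derivative_numerator(x):
    """Numerator of d_1'(x) over its common denominator."""
    return 4 * (-x + 7*x**2 - 18*x**3 + 24*x**4 - 18*x**5 + 6*x**6)

# Critical point (11) and the claimed maximum value.
x1 = (3 + sqrt(3 * (-3 + 2 * sqrt(3)))) / 6
claimed_max = 1 / sqrt(135 + 78 * sqrt(3))

# Largest |d_1| over a fine grid on [0, 1], for comparison.
grid_max = max(abs(d1(k / 10000)) for k in range(10001))

if __name__ == "__main__":
    print(x1, d1(x1), claimed_max, grid_max)
```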



4 Notation: t^+ = max(0, t), t^- = max(0, -t); hence |t| = t^+ + t^- and t = t^+ - t^-.
5 so that P(X = 1) = 1 - P(X = 0)
Figure 1. Plot of D(p) for p = (x, 1 - x), as a function
of x \in [0, 1].

5.2. Dimension n = 2: three colors of socks. The case n = 2
can be set up similarly to n = 1, but now we have three cases of
possible signs underlying absolute values. Each case is a smooth,
two-dimensional surface, and we find extremes by checking all critical values
arising from points where the gradient vanishes, and on the boundary.
To avoid subscripts, we switch notation from p = (p_0, p_1, p_2) to p =
(a, b, c), and define

    f(a, b, c) := a^2 (1 + 2(b + c) + 6bc),

    T(a, b, c) := \frac{a^2}{a^2 + b^2 + c^2} - f(a, b, c),

so that when p = (a, b, c), with a being the probability that a single
sock has color 0, T(a, b, c) = P(X = 0) - P(Y = 0). Note that
T(a, b, c) = T(a, c, b). Exchanging the roles among colors 0, 1, 2,
we have T(b, a, c) = P(X = 1) - P(Y = 1) and T(c, a, b) = P(X = 2) -
P(Y = 2). From Definitions 1 and 2, when p = (a, b, c),

    2 D(p) = |T(a, b, c)| + |T(b, a, c)| + |T(c, a, b)|.

The expression above has the form |T_1| + |T_2| + |T_3|, and the absolute
value function is an obstacle to taking the gradient. But by taking the
eight cases for the sign, each of the expressions ±T_1 ± T_2 ± T_3 is a
rational function.
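For three colors the functions f and T, and the resulting discrepancy, can be written down directly. The sketch below follows the definitions above, with T(a, b, c) taken to be P(X = 0) - P(Y = 0); the name `discrepancy3` is ours.

```python
from fractions import Fraction

def f(a, b, c):
    """P(Y = 0) for three colors, from (3): a^2 (1 + 2(b + c) + 6bc)."""
    return a**2 * (1 + 2 * (b + c) + 6 * b * c)

def T(a, b, c):
    """T(a, b, c) = P(X = 0) - P(Y = 0), where a is the mass of color 0."""
    return a**2 / (a**2 + b**2 + c**2) - f(a, b, c)

def discrepancy3(a, b, c):
    """D(p) for p = (a, b, c), via 2 D(p) = |T(a,b,c)| + |T(b,a,c)| + |T(c,a,b)|."""
    return (abs(T(a, b, c)) + abs(T(b, a, c)) + abs(T(c, a, b))) / 2

if __name__ == "__main__":
    a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
    print(T(a, b, c), T(b, a, c), T(c, a, b))  # the three terms sum to zero
    print(discrepancy3(a, b, c))
```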
A straightforward parameterization of the two-dimensional set of
probabilities (a, b, c) would have a \ge b \ge 1 - a - b \ge 0, implying that
T_1 \ge 0 and T_3 \le 0, so that there are only two cases, according to the
sign of T_2. A major obstacle to this approach is the boundary, which
is complicated, so instead we parameterize in terms of (x, y) \in [0, 1]^2
as follows:

    p(x, y) = (a, b, c) where t = 1 + x + y, a = 1/t, b = x/t, c = y/t.

Now taking a = a(x, y) and so on, we have three functions defined on
[0, 1]^2,

    T_1(x, y) := T(a, b, c),
    T_2(x, y) := T(b, a, c),
    T_3(x, y) := T(c, a, b).

The total variation distance is given by

(12)    2 d_TV(X, Y) = |T_1(x, y)| + |T_2(x, y)| + |T_3(x, y)|.

Since 1 \ge x, y, we have a \ge b, c, and since the largest mass is at a, we
know that for all x, y \in [0, 1], T_1(x, y) \ge 0.

We can eliminate the case T_1 \ge 0, T_2 \ge 0 and T_3 \ge 0, as this implies
T_1 = T_2 = T_3 = 0 since T_1 + T_2 + T_3 = 0. By Lemma 2 this case gives
D(p) = 0, not of interest in the search for the maximum value. There
are three remaining cases of sign to consider. Let

    d_1(x, y) = T_1(x, y) + T_2(x, y) - T_3(x, y),
    d_2(x, y) = T_1(x, y) - T_2(x, y) + T_3(x, y),
    d_3(x, y) = T_1(x, y) - T_2(x, y) - T_3(x, y).

Then max 2 d_TV(X, Y) = max(d_1, d_2, d_3), and so it suffices to check the
maximum values of each of these rational functions.

Let us consider g(x, y) := d_1(x, y) (see footnote 6). Since g is a rational
function in two variables, it is elementary to calculate the partial derivatives
with respect to x and y, denoted g_x and g_y, respectively. What is not
so elementary is finding all solutions (x, y) to the system g_x(x, y) =
g_y(x, y) = 0. This set, V(g_x, g_y) := {(x, y) : g_x = g_y = 0}, also known
as the affine variety defined by g_x, g_y, is what we wish to find; a good
introductory text on this subject is [3].



6 The term d_2 becomes d_1 under the interchange of x and y, so no further work
is required for d_2. For d_3, the corresponding h_x and h_y, after cancellation of a
common factor, have total degree 6 each, and one must account for the 36 solutions
guaranteed by Bezout's Theorem.






Continuing with this example, even though g_x and g_y are rational
functions, when each is rationalized it is clear that for x, y \ge 0 the
denominator is always positive, and hence plays no role in characterizing
the set of points in the variety V(g_x, g_y) \cap [0, 1]^2. Thus we may simply
find the variety of the numerators restricted to [0, 1]^2, denoted h_x and
h_y, respectively, which are bivariate polynomials.

A generalization of the Fundamental Theorem of Algebra due to Bezout (see
for example Chapter 5, Section 7 of [3]) can be used to verify that all
solutions have been found (see footnote 7). In this case, after dividing out
by a common factor of x, the two polynomials each have total degree 7.
Bezout's theorem guarantees 7 x 7 = 49 solutions total including
multiplicities, but some of these are solutions "at infinity" (see footnote 8).
Mathematica finds a set of 19 unique, easily-verified solutions; when
including multiplicities, this accounts for 39 of the total solutions. By hand
we can find 10 solutions at infinity, so all 49 solutions have been addressed.

We obtain the largest value of d_TV from the point (x, y) given by (see
footnote 9)

(13)    x \in (0, 1):  1 + 4x - 14x^2 - 4x^3 - 34x^4 + 20x^5 = 0,
        y:  y = x,
        2 d_TV = z \in (0, 0.2):  32000 + 168192 z - 4557600 z^2 + 14567472 z^3
                                  - 821583 z^4 + 314928 z^5 = 0.

This solution is of the form p = (x_2, (1 - x_2)/2, (1 - x_2)/2) for the
value of x_2 \in [0.5, 0.6] that solves

(14)    -5 + 42 x_2 - 114 x_2^2 + 168 x_2^3 - 153 x_2^4 + 54 x_2^5 = 0,

with x_2 = 0.582011, D(p) = 0.0842942; the exact value of D(p) is given
by Equation (13).

7 The precise form of the theorem requires several definitions and is not intended
to be the focus; instead, we merely require assurance that the solutions found by
Mathematica [5] are exhaustive, since they are easily verified.

8 Here is a simple analogy: How many times will a parabola intersect a line? A
parabola has degree 2 and a line has degree 1. Suppose our parabola is y = x^2: then
if our line is 1) y = x - 1, then there will be no intersections; 2) y = 0, then there is
one intersection of multiplicity 2; 3) y = x, then there are two unique intersections
of multiplicity 1 each; 4) x = a, for any real a, then there is one intersection of
multiplicity 1. By using an appropriate transformation into the projective plane,
one can guarantee exactly two solutions in all cases.

9 The Mathematica expressions are

    x = Root[1 + 4#1 - 14#1^2 - 4#1^3 - 34#1^4 + 20#1^5 &, 2],
    d_TV = (1/2) Root[32000 + 168192#1 - 4557600#1^2 + 14567472#1^3
                      - 821583#1^4 + 314928#1^5 &, 2].
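The stated three-color optimum can be cross-checked numerically. The sketch below assumes the parameterization a = 1/t, b = x/t, c = y/t with t = 1 + x + y and y = x, finds the root of the quintic in x by plain bisection on [0, 0.5] (our choice of bracket), and then evaluates the discrepancy and the two other polynomials at the resulting point.

```python
def bisect_root(func, lo, hi, iterations=200):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def quintic_x(x):
    """Polynomial in x from (13)."""
    return 1 + 4*x - 14*x**2 - 4*x**3 - 34*x**4 + 20*x**5

def quintic_z(z):
    """Polynomial in z = 2 d_TV from (13)."""
    return (32000 + 168192*z - 4557600*z**2 + 14567472*z**3
            - 821583*z**4 + 314928*z**5)

def quintic_x2(a):
    """Polynomial in x_2 from (14)."""
    return -5 + 42*a - 114*a**2 + 168*a**3 - 153*a**4 + 54*a**5

def discrepancy3(a, b, c):
    """D(p) for three colors, as half the sum of |P(X = i) - P(Y = i)|."""
    def T(a, b, c):
        return a*a / (a*a + b*b + c*c) - a*a * (1 + 2*(b + c) + 6*b*c)
    return (abs(T(a, b, c)) + abs(T(b, a, c)) + abs(T(c, a, b))) / 2

x = bisect_root(quintic_x, 0.0, 0.5)
t = 1 + 2 * x                 # y = x, so t = 1 + x + y
p = (1 / t, x / t, x / t)     # p[0] plays the role of x_2
D = discrepancy3(*p)

if __name__ == "__main__":
    print(x, p[0], D, quintic_x2(p[0]), quintic_z(2 * D))
```

Substituting x = (1 - x_2)/(2 x_2) into the quintic of (13) and clearing denominators gives the quintic of (14) up to a constant factor, which is why both vanish at the same point.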



6. Conjectures about the largest possible discrepancy 

The weakest conjecture is that there is some nontrivial upper bound
on discrepancy. Formally, we define the universal constant for the pair
discrepancy by

(15)    \ell_0 := \sup_p D(p),

where the supremum is over all distributions p on a finite or countable
set of colors. Since total variation distance is always less than or equal
to 1, trivially \ell_0 \le 1, and the conjecture is

Conjecture 1. The constant defined by (15) is strictly less than 1,
i.e.,

(16)    \ell_0 < 1.



6.1. Conjectures for a finite number of colors. If there are a finite
number of colors, say n + 1 with n \ge 0, then we can relabel the colors
as 0, 1, ..., n so that p = (p_0, ..., p_n) with

(17)    p_0 \ge p_1 \ge \cdots \ge p_n \ge 0,    p_0 + p_1 + \cdots + p_n = 1.

Given n > 0, and x \in [1/(n+1), 1), let

(18)    p(n, x) := (x, \frac{1 - x}{n}, ..., \frac{1 - x}{n}),

which, due to x \in [1/(n+1), 1), satisfies (17).

With the notation (18), the result of Section 5.2 may be summarized
as: for n = 2, over all probability distributions on n + 1 colors
standardized to satisfy (17), the maximum value of D(p) is achieved,
uniquely, at p = p(2, x_2), with x_2 specified by (14).



For each n > 0, (18) defines a one parameter family of probability
distributions. At the endpoint x = 1/(n+1), p(n, x) is a uniform
distribution. Now suppose that x \in (1/(n+1), 1), so that
p(n, x) has p_0 > p_1 = p_2 = \cdots = p_n > 0. It is obvious from (2)
that P(X = 0) > P(X = 1) = \cdots = P(X = n) > 0, and Lemma 1 implies
that P(Y = 0) > P(Y = 1) = \cdots = P(Y = n) > 0. That is,
both X and Y have distributions in the same one parameter family.
Finally, (6) implies that P(X = 0) > P(Y = 0), while for i = 1 to
n, P(X = i) < P(Y = i), and hence using (10), for each n > 0 and
x \in (1/(n+1), 1), p = p(n, x) has the simplified expression for its
discrepancy,

(19)    D(p) = P(X = 0) - P(Y = 0)
             = \frac{x^2}{x^2 + (1 - x)^2/n}
               - x^2 \sum_{k=0}^{n} (k+1)! \binom{n}{k} (\frac{1 - x}{n})^k.

Figure 2. D(p) for the one parameter families (18), n = 1 to 9. For each
n, we plot \frac{n+1}{n} x - \frac{1}{n} versus D(p(n, x)), so that all 9
graphs have domain [0, 1].
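The one parameter family and its discrepancy are simple to compute. In the sketch below, `discrepancy_family` uses P(X = 0) from (2) and P(Y = 0) from (3) specialized to the family (mass x on color 0 and (1 - x)/n on each other color); the function names are ours, and the reference values in the usage lines are those quoted in the text.

```python
from math import comb, factorial

def family(n, x):
    """p(n, x): mass x on color 0, and (1 - x)/n on each of colors 1..n."""
    return [x] + [(1 - x) / n] * n

def discrepancy_family(n, x):
    """D(p(n, x)) = P(X = 0) - P(Y = 0), valid for 1/(n+1) <= x < 1."""
    q = (1 - x) / n
    p_x0 = x**2 / (x**2 + n * q**2)
    # There are comb(n, k) ways to choose the k single colors among 1..n.
    p_y0 = x**2 * sum(factorial(k + 1) * comb(n, k) * q**k for k in range(n + 1))
    return p_x0 - p_y0

if __name__ == "__main__":
    print(family(2, 0.5))
    print(discrepancy_family(1, 0.75))      # the 12-white, 4-black example
    print(discrepancy_family(2, 0.582011))  # near the three-color maximum
```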



Conjecture 2. For every nonnegative integer n, among all probability
distributions on n + 1 colors, the maximum value of D(p) is achieved
by a distribution of the form p(n, x_n).

A slightly stronger conjecture is the following:

Conjecture 3. For every nonnegative integer n, among all probability
distributions on n + 1 colors, the maximum value of D(p) is achieved
uniquely by p(n, x_n), where x_n = argmax_x D(p(n, x)).

We cannot prove Conjecture 2, but we believe it to be true, for the
following reasons.



(1) It is true, trivially for n = 0 and n = 1, and by Section 5.2, for
    n = 2.
(2) By broad analogy, many symmetric payoff functions achieve
    their extreme values at points with lots of symmetry. Indeed,
    Theorem 1 asserts that for each n, D(p) achieves its minimum
    value, zero, at the uniform distribution, corresponding to the
    maximum conceivable symmetry in p, while the family in (18)
    corresponds to breaking symmetry somewhat, but as little as
    possible.
(3) The one parameter family (18) shows up in other extremal
    problems which share the feature that the labels on the colors are
    irrelevant, and only the values of the probabilities matter. In
    particular, in information theory, the one parameter families
    show that "Fano's inequality is sharp;" see Cover and Thomas
    [2], (2.135) on page 40.
(4) For the moderate values n = 3, 4, ..., 8, when generating a
    million random points from the n-dimensional region specified
    by (17), the largest observed D(p) in the sample came from a
    p that was close, by eye, to the form of (18).



The table below summarizes approximate extreme values under the
one parameter families (18) for n = 1, ..., 9, using the notation x_n =
argmax_x D(p(n, x)).

    n    x_n                       D(p(n, x_n))
    1    0.6966599465951643196     0.06084679923181354776
    2    0.5820110139097399105     0.08429419234614604446
    3    0.5160030571683498864     0.09766297359542326758
    4    0.4710812367633940106     0.10661363736945495196
    5    0.4376598564845561514     0.11316011048732238932
    6    0.4113811479448445739     0.11822473613430355437
    7    0.3899258770101118464     0.12229838762442936532
    8    0.3719239304877958135     0.12566994796517442344
    9    0.3565033913388721410     0.12852218802677888163

Figure 2 shows, for n = 1 to 9, D(p(n, x)) for x \in [1/(n+1), 1]; the graph
plots \frac{n+1}{n} x - \frac{1}{n} versus D(p(n, x)), so that all 9 graphs
use the same domain, [0, 1].
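The tabulated values can be approximated with a short numerical search. The sketch below is not the computation used for the table: it assumes double precision is adequate, locates the best point on a coarse grid over [1/(n+1), 1], and then refines it by golden-section search, so it recovers only the leading digits.

```python
from math import comb, factorial, sqrt

def discrepancy_family(n, x):
    """D(p(n, x)) = P(X = 0) - P(Y = 0) for the one parameter family."""
    q = (1 - x) / n
    p_x0 = x**2 / (x**2 + n * q**2)
    p_y0 = x**2 * sum(factorial(k + 1) * comb(n, k) * q**k for k in range(n + 1))
    return p_x0 - p_y0

def golden_max(func, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of func on [lo, hi]."""
    ratio = (sqrt(5) - 1) / 2
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc > fd:                      # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = func(c)
        else:                            # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = func(d)
    return (lo + hi) / 2

def best_x(n, grid=2000):
    """Coarse grid over [1/(n+1), 1], then golden-section refinement."""
    lo, hi = 1 / (n + 1), 1.0
    xs = [lo + (hi - lo) * j / grid for j in range(grid + 1)]
    j = max(range(grid + 1), key=lambda idx: discrepancy_family(n, xs[idx]))
    left, right = xs[max(j - 1, 0)], xs[min(j + 1, grid)]
    return golden_max(lambda x: discrepancy_family(n, x), left, right)

if __name__ == "__main__":
    for n in range(1, 10):
        x_n = best_x(n)
        print(n, round(x_n, 6), round(discrepancy_family(n, x_n), 8))
```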



7. Limit analysis of the one parameter family

Theorem 2. For c \in (0, \infty) define

(20)    \ell(c) := \frac{c^2}{1 + c^2} - \int_0^\infty c^2 t e^{-ct} e^{-t^2/2} dt.

For any c \in (0, \infty) and n > 1/c^2, let p^{(n)} = p(n, c/\sqrt{n}) be the
distribution governed by (18) with x = c/\sqrt{n}. Then

(21)    \lim_{n \to \infty} D(p^{(n)}) = \ell(c),

where \ell is defined by (20).



Proof. Extend Method 2 beyond the time of the first matching pair;
i.e., pick socks forever. For each color i let N_i be the number of sock
picks needed to get the second sock of color i. As the color varies, these
random variables are dependent, since for any two distinct colors i, j
and time n \ge 2, 0 = P(N_i = N_j = n) < P(N_i = n) P(N_j = n). There
is a standard technique to deal with this dependence, used in Markov
chains (see footnote 10), which is to take a sequence of independent
exponentially distributed holding times Y_1, Y_2, ..., with P(Y_n > t) = e^{-t},
and declare that the nth sock arrives at time Y_1 + Y_2 + \cdots + Y_n (see
footnote 11). With values in (0, \infty), the time T_i at which color i is
first seen for the second time can be expressed as T_i = Y_1 + \cdots + Y_{N_i}.
The distribution of the color of the first matching pair found, initially
specified by (3), can also be expressed as

    P(Y = i) = P(T_i < \min_{j \ne i} T_j).

For each color i, the times at which socks of color i arrive form a Poisson
arrivals process with rate p_i, and as the color varies, these processes
are mutually independent; in particular the second arrival times are
mutually independent.

We are considering socks distributed according to p(n, c/\sqrt{n}), that
is, with y := (1 - c/\sqrt{n}),

(22)    p_0 = c/\sqrt{n}, p_1 = y/n, p_2 = y/n, ..., p_n = y/n.

Speed up time by a factor of \sqrt{n}; now socks of color 0 arrive at rate
c, and for each other color i = 1 to n, socks of color i arrive at rate
p_i \sqrt{n} = y/\sqrt{n}. For t > 0, and for each i = 1 to n, the number Z of socks
of color i collected by time t is Poisson with parameter \lambda = t y/\sqrt{n},
and the event {T_i > t} is the event {Z < 2} = {Z = 0 or 1}, with
probability

    P(T_i > t) = P(Z = 0) + P(Z = 1)
               = e^{-\lambda} (1 + \lambda)
               = \exp(-t y/\sqrt{n}) (1 + t y/\sqrt{n})
               = 1 - \frac{t^2 y^2}{2n} + O(n^{-3/2}).

The easy way to see the result above is to argue that \lambda is small, so
e^{-\lambda}(1 + \lambda) = (1 - \lambda + \lambda^2/2 - \lambda^3/6 + \cdots)(1 + \lambda)
= 1 - \lambda^2 + \lambda^2/2 + O(\lambda^3) = 1 - \lambda^2/2 + O(\lambda^3).

10 see for example [?].

11 The number of socks picked by time t is thus Poisson distributed, with mean
t. Write C_i(t) = the number of socks of color i chosen by time t. As i varies, the
counts C_i(t) are mutually independent; this observation is known as Poissonization.
See exercise XII.6.3 in Feller [?].

The event {\min(T_1, ..., T_n) > t} is the intersection of the events
{T_i > t}, so using the mutual independence, together with y \to 1,

    P(\min(T_1, ..., T_n) > t) = (1 - \frac{t^2 y^2}{2n} + O(n^{-3/2}))^n
                               \to \exp(-t^2/2).

Finally, we argue that the density of T_0, the second arrival time in a
Poisson process with rate c, is given by

    f(t) = c^2 t e^{-ct}.

This is a standard fact, known to some as the density of the Gamma
distribution with shape parameter 2 and rate parameter c. Using the
independence of T_0 and \min(T_1, ..., T_n), we can condition on the value
t for T_0 to get

    P_n(Y = 0) = P(\min(T_1, ..., T_n) > T_0)
               = \int_0^\infty P(\min(T_1, ..., T_n) > t) f(t) dt
               = \int_0^\infty P(\min(T_1, ..., T_n) > t) c^2 t e^{-ct} dt
               \to \int_0^\infty c^2 t e^{-ct} e^{-t^2/2} dt.



The above amounts to a calculation of the limit, as n \to \infty, of
P_n(Y = 0), corresponding to Method 2 when the underlying colors
come from (22). For Method 1 the calculation is easier: using (1) we
have f_2 = p_0^2 + p_1^2 + \cdots + p_n^2 = (c/\sqrt{n})^2 + n(y/n)^2 = c^2/n + y^2/n and

    P_n(X = 0) = \frac{p_0^2}{f_2} = \frac{c^2/n}{c^2/n + y^2/n}
               = \frac{c^2}{c^2 + y^2} \to \frac{c^2}{c^2 + 1}.

At (19) we had already argued that once n is large enough that
p_0 > p_1 we have the simplification, for our one parameter family, that
D(p^{(n)}) = P_n(X = 0) - P_n(Y = 0). Combining this calculation of
D(p^{(n)}) with the limit values derived for P_n(Y = 0) and P_n(X = 0),
(21) follows. □
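The limit in the proof, c^2/(c^2 + 1) minus the integral of c^2 t e^{-ct} e^{-t^2/2} over t > 0, is easy to evaluate numerically. In the sketch below the closed form in `ell` comes from completing the square in the exponent, which is our own calculation and not taken from the text; `ell_quadrature` is an independent midpoint-rule check, and the grid search for the maximizing c is likewise only illustrative.

```python
from math import erfc, exp, pi, sqrt

def ell(c):
    """c^2/(1 + c^2) - int_0^inf c^2 t exp(-c t - t^2/2) dt.

    Completing the square gives
    int_0^inf t exp(-c t - t^2/2) dt
        = 1 - c * sqrt(pi/2) * exp(c^2/2) * erfc(c/sqrt(2)).
    """
    integral = 1 - c * sqrt(pi / 2) * exp(c * c / 2) * erfc(c / sqrt(2))
    return c * c / (1 + c * c) - c * c * integral

def ell_quadrature(c, upper=20.0, steps=100_000):
    """Same quantity with the integral replaced by a midpoint rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += t * exp(-c * t - t * t / 2)
    return c * c / (1 + c * c) - c * c * total * h

# Grid search for the maximizing c on [0.5, 3] with step 0.001.
best_k = max(range(500, 3001), key=lambda k: ell(k / 1000))
c_star, ell_star = best_k / 1000, ell(best_k / 1000)

if __name__ == "__main__":
    print(c_star, ell_star)
```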



We note that instead of invoking Poissonization, as in the above
proof, one can argue directly with the explicit expression in (19), to
show that under x = c/\sqrt{n} and k = t\sqrt{n}, the sum in (19) is a Riemann
approximation for \int_0^\infty c^2 t e^{-ct} e^{-t^2/2} dt.

8. Discussion

Figure 3. Plot of c versus \ell(c) for c = 0 to 10. The
maximum occurs at c_0 = 1.514 and has value \ell(c_0) =
0.18320.

If Conjecture 2 is true, it will follow that Conjecture 1 is also true,
with the value of the universal constant for a pair of socks given by

(23)    \ell_0 = \sup_c \ell(c) = 0.1832000624087106.



The argument requires two parts. The first part is to show that \ell_0,
defined in (15) as the sup of D(p) over all discrete distributions, is
equal to the sup over distributions with finite support. This is "soft"
analysis, showing first that p \mapsto D(p) is continuous, hence given p
with discrepancy greater than \ell_0 - \epsilon we can find a nearby distribution
p' with finite support, close enough to p to guarantee that its discrepancy
is greater than \ell_0 - 2\epsilon. The second part, giving the concrete value
for \ell_0, uses compactness: given distributions p^{(n)} = p(n, x_n) with
discrepancies converging to \ell_0, the values c_n := x_n \sqrt{n} \in [0, \infty], n \ge 1, lie
in a compact set, and hence there must be convergent subsequences. If
c_{n_k} \to c_0 and c_0 \in (0, \infty), then the proof of Theorem 2 already shows
that the associated discrepancies converge to \ell(c_0). If c_{n_k} \to c_0 with
c_0 = 0 or c_0 = \infty, a small extension of the proof of Theorem 2 would
show that the associated discrepancies would converge to 0. So indeed,
c_n \to c_0 and D(p^{(n)}) \to \ell(c_0).

9. Shoes instead of socks: a matching left-right pair 

Suppose, instead of wanting to collect a pair of matching socks, we 
want a pair of matching shoes. Naturally, this means one left shoe, and 
one right shoe, both of the same color. There are two reasonable ways 
to extend our study to this situation. 

9.1. One distribution for left colors, another distribution for 
right colors. The setup here involves two discrete probability distri- 
butions, say p for the color S of a left shoe, and q for the color S' of a 
right shoe. The analog of (1) is

(24)    f_2 = P(S = S') = \sum_i P(S = S' = i) = \sum_i p_i q_i

for the probability that a random left shoe and a random right shoe
match. We require that for at least one value i, p_i q_i > 0. The analog
of (2) is the Method 1 distribution for the color X = X(p, q) of a
matching left-right pair

(25)    P(X = i) = P(S = i | S = S') = \frac{p_i q_i}{f_2}.

For Method 2, we assume that at times 1, 3, 5, ..., one left shoe is
collected, and at times 2, 4, 6, ..., one right shoe is collected. Suppose
that at time k - 1, there is not yet a matching left-right pair, but at
time k, there is; then Y = Y(p, q) is the color of the shoe collected at
time k (see footnote 12).
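Both shoe procedures are straightforward to code. The sketch below computes the Method 1 distribution from (25) and simulates the alternating left-right collection described for Method 2; the function names, the seed, and the use of simulation rather than an exact formula are our choices.

```python
import random
from fractions import Fraction

def method1_shoes(p, q):
    """Formula (25): P(X = i) = p_i q_i / f_2, with f_2 = sum_i p_i q_i."""
    f2 = sum(a * b for a, b in zip(p, q))
    return [a * b / f2 for a, b in zip(p, q)]

def method2_shoes(p, q, rng):
    """Collect left shoes at odd times and right shoes at even times; stop at
    the first time a left-right match exists and return the color of the
    shoe collected at that time."""
    colors = range(len(p))
    left_seen, right_seen = set(), set()
    while True:
        left = rng.choices(colors, weights=p, k=1)[0]
        if left in right_seen:
            return left
        left_seen.add(left)
        right = rng.choices(colors, weights=q, k=1)[0]
        if right in left_seen:
            return right
        right_seen.add(right)

def color0_fraction(p, q, trials, seed):
    """Monte Carlo estimate of P(Y = 0) under the alternating procedure."""
    rng = random.Random(seed)
    return sum(method2_shoes(p, q, rng) == 0 for _ in range(trials)) / trials

if __name__ == "__main__":
    p = [Fraction(3, 4), Fraction(1, 4)]
    q = [Fraction(1, 4), Fraction(3, 4)]
    print(method1_shoes(p, q))
    print(color0_fraction([0.75, 0.25], [0.25, 0.75], 20000, seed=3))
```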

12 There are other sensible ways to determine the matching color under sequential
collection of shoes, for example, selecting one left and one right shoe each at time
1, 2, 3, ... and breaking ties via a coin flip. Even here, choices remain. For example,
if the outcome is L_1 = red, R_1 = blue, L_2 = red, R_2 = white, L_3 = white, R_3 =
red, then the tiebreak might be specified as equal odds for white versus red, or,
since the available matches at time 3 are (L_1, R_3), (L_2, R_3), and (L_3, R_2), as 2 to 1
The analog of discrepancy is now

(26)    D(p, q) = d_TV(X(p, q), Y(p, q)).

It is fairly easy to see that for this situation, the analog of Conjecture
1 is false; that is, the supremum of the discrepancy over all pairs of
distributions is no smaller than the trivial upper bound on total variation
distance:

(27)    1 = \sup_{p, q} D(p, q).



We give a brief sketch of a proof of (27): with $a = a(n) = n^{-1/4}$
and $b = b(n) = n^{-2/3}$, let $p = p(n,a)$ and $q = p(n,b)$; in other words,
$p_0 = P(S = 0) = a$, $q_0 = P(S' = 0) = b$, and for $i = 1$ to $n$, $p_i = P(S =
i) = (1 - a)/n$, $q_i = P(S' = i) = (1 - b)/n$, with $a = n^{-1/4}$, $b = n^{-2/3}$.
We have $p_0 q_0 = n^{-11/12}$ and

$\sum_{i=1}^{n} p_i q_i = n \, \frac{1-a}{n} \, \frac{1-b}{n} < \frac{1}{n} = o(p_0 q_0),$

so the Method 1 distribution converges to point mass at color 0, i.e.,
$P_n(X = 0) \to 1$. To see that the Method 2 distribution has, in the limit,
probability zero of getting color 0, consider collecting alternately left
and right shoes forever. At time $m = 2n^{5/8}$, we will have collected $n^{5/8}$
left and $n^{5/8}$ right shoes. Thanks to the small value $q_0 = b = n^{-2/3}$,
we expect only $n^{-1/24}$ right shoes of color 0 at time $m$, so with high
probability, we do not yet have a matching pair of color 0. But, at
time $m$, for each color $i = 1$ to $n$, the number of left shoes of color $i$ is
Binomial$(m/2, (1 - a)/n)$, and hence is greater than zero with probability
asymptotic to $m/(2n) = n^{-3/8}$. Independently, the number of right shoes
of color $i$ is greater than zero with probability asymptotic to $n^{-3/8}$;
hence the probability of at least one pair of color $i$ is asymptotic to
$n^{-3/4}$. The number $W$ of colors $i > 0$ for which we have a pair has
$EW \sim n^{1/4}$, and the $n$ events are negatively correlated with each
other, so $\mathrm{Var}\,W \le EW$. By Chebyshev's inequality, $P(W = 0) \le
\mathrm{Var}\,W/(EW)^2 = O(n^{-1/4})$. So at time $m$, we are unlikely to have any
pair of color 0, and unlikely not to have at least one pair of some other
color, hence $P_n(Y = 0) \to 0$.
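The rates in this sketch can be checked numerically. The following Python fragment is only an illustration under the stated choices $a = n^{-1/4}$, $b = n^{-2/3}$; the function names are hypothetical, and `expected_counts` uses the first-order approximation that a color appears on one side with probability about (number of shoes) times (probability per shoe).

```python
def method1_mass_at_zero(n):
    """Exact P_n(X = 0) for p = p(n, a), q = p(n, b), a = n^(-1/4), b = n^(-2/3)."""
    a = n ** -0.25
    b = n ** (-2.0 / 3.0)
    heavy = a * b                        # p_0 q_0 = n^(-11/12)
    rest = (1.0 - a) * (1.0 - b) / n     # sum over i = 1..n of p_i q_i
    return heavy / (heavy + rest)

def expected_counts(n):
    """First-order expectations at time m = 2 n^(5/8).

    Returns (expected right shoes of color 0, approximate E W), where W
    counts the colors i > 0 that already have a left-right pair.
    """
    a = n ** -0.25
    b = n ** (-2.0 / 3.0)
    per_side = n ** 0.625                       # n^(5/8) shoes on each side
    right_zero = per_side * b                   # equals n^(-1/24)
    left_hit = per_side * (1.0 - a) / n         # about n^(-3/8)
    right_hit = per_side * (1.0 - b) / n        # about n^(-3/8)
    return right_zero, n * left_hit * right_hit  # second entry about n^(1/4)
```

Since $P_n(X = 0) = 1/(1 + (1-a)(1-b)\,n^{-1/12})$, the convergence to 1 is slow: `method1_mass_at_zero(10**12)` is only about 0.91.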



in favor of red over white. For this outcome, our specification in the main text
is white, since the earliest match occurs at time 5, when $L_3$ = white is observed.
9.2. With the constraint p = q. Now suppose that we declare that
the distribution p for left shoes and the distribution q for right shoes 
must be equal. This does not reduce consideration of the distribution 
of a matching pair to the situation for socks; under the alternating 
left-right procedure, if we get a blue left shoe at time 1, a red right 
shoe at time 2, and another blue left shoe at time 3, then we still have 
not collected a matching pair. 

The analog of Conjecture 1 for the situation of a matching left-right
pair of shoes, under the constraint of equal distributions, is plausible:

Conjecture 4.

(28)    $\sup_p D(p,p) < 1.$

Furthermore, we can even propose a value for the universal constant
for shoes, given by the left side of (28). It comes from an analog of
Theorem 2. This analog of Theorem 2 is easiest to understand without
the constraint p = q.

Theorem 3. For $a, b \in (0, \infty)$ define

(29)    $\ell(a,b) = \frac{ab}{1+ab} - \int_0^\infty \left( a e^{-at} + b e^{-bt} - (a+b) e^{-(a+b)t} \right) e^{-t^2} \, dt.$

For $a, b > 0$ and sufficiently large $n$, let

(30)    $p^{(n)} = p(n, a/\sqrt{n}), \quad q^{(n)} = p(n, b/\sqrt{n})$

as in (18). Then

(31)    $\lim_{n \to \infty} D(p^{(n)}, q^{(n)}) = \ell(a,b).$
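For readers who wish to evaluate the limit numerically, here is a hedged Python sketch. It assumes that $\ell(a,b)$ is the limiting Method 1 mass at color 0, namely $ab/(1+ab)$, minus the integral in (29), and it evaluates that integral in closed form by completing the square: $\int_0^\infty e^{-ct - t^2}\,dt = \frac{\sqrt{\pi}}{2} e^{c^2/4}\,\mathrm{erfc}(c/2)$. The names `laplace_gauss`, `ell`, and `argmax_diagonal` are hypothetical, and the grid search is a crude illustration rather than a careful optimizer.

```python
import math

def laplace_gauss(c):
    """Integral of exp(-c*t - t^2) over t in [0, infinity), for moderate c >= 0."""
    return 0.5 * math.sqrt(math.pi) * math.exp(c * c / 4.0) * math.erfc(c / 2.0)

def ell(a, b):
    """Limit discrepancy: Method 1 mass at color 0 minus Method 2 mass at color 0."""
    method1_mass = a * b / (1.0 + a * b)
    method2_mass = (a * laplace_gauss(a) + b * laplace_gauss(b)
                    - (a + b) * laplace_gauss(a + b))
    return method1_mass - method2_mass

def argmax_diagonal(lo=1.0, hi=2.0, steps=1000):
    """Grid point in [lo, hi] maximizing ell(a, a); resolution (hi - lo) / steps."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda a: ell(a, a))
```

Under these assumptions, `ell(a, a)` evaluated near `a = 1.5622` should be close to 0.1998, and `ell` is symmetric in its two arguments. For very large `c` the product `exp(c*c/4) * erfc(c/2)` overflows in floating point, so a scaled complementary error function would be needed there.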

Proof. The argument closely follows the proof for Theorem 2. We omit
details, apart from sketching the main differences: under the distributions
in (30), collecting left-right pairs with mean $1/\sqrt{n}$ holding times
between pairs, the left shoes of color 0 form a rate $a$ Poisson process,
the right shoes of color 0 form a rate $b$ Poisson process; P(no left 0
by time $t$) $= e^{-at}$, P(no right 0 by time $t$) $= e^{-bt}$, and in the limit,
the two processes are independent, so P(no left 0 and no right 0 by
time $t$) $= e^{-(a+b)t}$. Inclusion-exclusion and differentiation lead to the
limit density of the time $T_0$ at which a left-right pair of color 0 is
found, $f(t) = a e^{-at} + b e^{-bt} - (a+b) e^{-(a+b)t}$, instead of the $c^2 t e^{-ct}$
of Theorem 2. At time $t$, for each of the $n$ other colors we expect,
asymptotically, $t/\sqrt{n}$ instances on the left, and $t/\sqrt{n}$ on the right,
with $t^2/n$ for the asymptotic chance of having a pair. This leads to
$P(\min(T_1, \ldots, T_n) > t) \to \exp(-t^2)$, instead of the $\exp(-t^2/2)$ of
Theorem 2. □
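Two ingredients of this sketch can be sanity-checked numerically. The Python fragment below is an illustration under simplifying assumptions: the function names are hypothetical, color 0 is treated via the limiting independent Poisson processes, and each of the $n$ other colors is given independent Poisson counts with mean $t/\sqrt{n}$ on each side (a Poissonized stand-in for the actual finite-$n$ counts).

```python
import math

def pair_time_cdf(a, b, t):
    """P(T_0 <= t): both a left and a right shoe of color 0 have appeared by time t."""
    return (1.0 - math.exp(-a * t)) * (1.0 - math.exp(-b * t))

def pair_time_density(a, b, t):
    """Limit density f(t) = a e^{-at} + b e^{-bt} - (a+b) e^{-(a+b)t}."""
    return (a * math.exp(-a * t) + b * math.exp(-b * t)
            - (a + b) * math.exp(-(a + b) * t))

def no_pair_probability(n, t):
    """P(no pair among n other colors by time t) in the Poissonized model."""
    x = t / math.sqrt(n)
    p_side = -math.expm1(-x)                 # P(at least one shoe on one side)
    return math.exp(n * math.log1p(-p_side * p_side))
```

A central difference of `pair_time_cdf` should reproduce `pair_time_density`, and `no_pair_probability(n, t)` should approach `math.exp(-t * t)` as `n` grows.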
Figure 4. Plot of $\ell(a,a)$, the limit discrepancy $D(p,q)$
when $p = q = p(n, a/\sqrt{n})$. The maximum value
0.19980867... occurs at $a$ = 1.562239....




Figure 5. Plot of $\ell(a,b)$, the limit discrepancy $D(p,q)$
when $p = p(n, a/\sqrt{n})$ and $q = p(n, b/\sqrt{n})$. The curve
in Figure 4 lies along the diagonal, splitting the plot into
two symmetric pieces.



While we do not have evidence for the analog of Conjecture 2 (indeed,
it seems daunting to deal with the analog of Section 5.2 for
left-right pairs under equal distribution for left and right), the analog
of Conjecture 1 combined with (23) is the following plausible conjecture.
See Figure 4 for the source of the constant 0.1998....

Conjecture 5.

$\sup_p D(p,p) = \max_a \ell(a,a) = 0.199808674053.$